Equation Solvers for Distributed-Memory Computers


by Olaf O. Storaasli, (O.O.Storaasli@larc.nasa.gov or 804-864-2927)


for Workshop on the Role of Computers in Langley R&D (6-15-94) 

A large number of scientific and engineering problems reduce to the solution of
large systems of simultaneous equations. Solving such systems rapidly therefore
makes large-scale structures, physics, electromagnetics and fluid mechanics
problems tractable. The performance of parallel computers now exceeds that of
traditional vector computers by nearly an order of magnitude, so the challenge
is to solve large systems of equations rapidly on the new breed of scalable
parallel processing supercomputers.

Research at Langley on solving equations on distributed memory computers goes
back nearly ten years to the Langley Finite Element Machine, one of the nation's
first parallel computers, with 32 processors, developed by NASA before
commercial parallel computers were available. Since then, both iterative and
direct parallel equation solvers have been developed and tuned for parallel
computers manufactured by Flexible Computer, N-Cube, Alliant, Encore, Cray,
Intel, Convex and IBM. The solvers, PVSOLVE and PVS-MP, are currently running
on the IBM SP-1 and SP-2 under a Memorandum of Agreement with IBM, which
permits Langley early access to the SP-1 and SP-2 in return for giving IBM
permission to use the NASA solvers for advertisements, demonstrations, and
trade shows. These Langley solvers are timely: under a recent $22.4 million
procurement, two IBM SP-2 supercomputers will be delivered to NASA (160
processors to NAS and 48 processors to LaRC). Based on benchmarks and the
Langley parallel equation solvers, these IBM supercomputers promise to surpass
the performance of traditional Cray vector supercomputers and other parallel
computers.


The talk will describe the major issues involved in parallel equation solvers, with
particular emphasis on implementations on the Intel Paragon, IBM SP-1 and SP-2.

Objective

Faster, cheaper, better analysis/design
of large-scale structures

- Develop algorithms to exploit distributed-memory computers
- Evaluate computational performance

Outline

• Distributed-memory Computers
• Structural Applications
• Structural Analysis
  - Nodal Generation and Assembly
  - Linear Equation Solvers
• Structural Optimization
  - Design Sensitivity

1994 Distributed-Memory Supercomputers

IBM SP-2
  Being installed this summer at NAS (160 proc) and LaRC (48 proc)
  266 MFLOPS/proc peak

Intel Paragon
  Current world record: 143 GigaFLOPS for MP-Linpack

Parallel Structural Matrix Generator/Assembler Demonstrated on HSCT

Nearly ideal parallel speedup
(no interprocessor communication)
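
One way to read "nodal" generation and assembly is that each processor builds only the stiffness-matrix rows for the nodes it owns, recomputing any element it needs, so no processor waits on another. The sketch below illustrates that idea on a chain of 1-D bar elements. The function names, the round-robin node partition, and the bar model are all assumptions made for illustration; the slides do not give the actual scheme.

```python
def element_stiffness(k):
    """2x2 stiffness matrix of a 1-D bar element with axial stiffness k."""
    return [[k, -k], [-k, k]]


def assemble_rows(owned_nodes, elements, n_nodes):
    """Build the global stiffness rows for owned_nodes only (hypothetical helper).

    Every processor scans the element list itself, so elements shared by two
    owners are generated twice, but no data is exchanged between processors.
    """
    rows = {node: [0.0] * n_nodes for node in owned_nodes}
    for i, j, k in elements:
        ke = element_stiffness(k)
        for a, row_node in enumerate((i, j)):
            if row_node in rows:
                for b, col_node in enumerate((i, j)):
                    rows[row_node][col_node] += ke[a][b]
    return rows


n_nodes = 5
# Element n joins node n to node n+1 with stiffness 100*(n+1) (made-up values).
elements = [(n, n + 1, 100.0 * (n + 1)) for n in range(n_nodes - 1)]

n_procs = 2
# Round-robin ownership: processor 0 owns nodes 0, 2, 4; processor 1 owns 1, 3.
partitions = [list(range(p, n_nodes, n_procs)) for p in range(n_procs)]

# Each call below is independent and could run on a separate processor.
local = [assemble_rows(set(nodes), elements, n_nodes) for nodes in partitions]

# Collect the rows only to display the full matrix; a solver could leave them
# distributed.
K = [None] * n_nodes
for rows in local:
    for node, row in rows.items():
        K[node] = row

for row in K:
    print(row)
```

The price of avoiding communication in this sketch is redundant element generation at partition boundaries.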

• Iterative: slow, convergence not guaranteed
• Direct: complex coding (banded, sparse)

Mach 2.4 HSCT
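
The trade-off between the two solver families can be seen on a tiny system. The sketch below solves the same symmetric positive-definite system with a dense Cholesky factorization (direct) and with Jacobi iteration (iterative). Both methods and the 3x3 test matrix are chosen here only for illustration; the slides do not say which algorithms PVSOLVE or PVS-MP use, and production solvers exploit banded or sparse storage, which is where the coding complexity comes from.

```python
import math


def cholesky_solve(a, b):
    """Direct solve of a x = b for symmetric positive-definite a (dense storage)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    # Back substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def jacobi_solve(a, b, tol=1e-10, max_iter=1000):
    """Jacobi iteration; returns (x, iterations) or raises if it does not converge."""
    n = len(a)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        x_new = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, it
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")


# Small diagonally dominant test system (made-up values).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
rhs = [2.0, 4.0, 10.0]

x_direct = cholesky_solve(A, rhs)
x_iter, n_iter = jacobi_solve(A, rhs)
print("direct   :", x_direct)
print("iterative:", x_iter, "after", n_iter, "iterations")
```

The direct solve finishes in a fixed number of operations, while the iteration count of the Jacobi loop depends on the matrix, and for a matrix that is not diagonally dominant the loop may never meet its tolerance.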

Paragon Status

• OSF Rev 1.1 (Latency: 150 -> 85 μsec, Tools,
  Communication: 11 -> 34 MB/sec, Memory: 8 -> 6 MB)
• OSF Rev 1.2 (Latency: 85 -> 50 μsec,
  Communication: 34 -> 55 MB/sec)
• New comm chip: tested at 400 MB/sec
• Dynamic Memory: avoid inconsistencies
  (i.e. faster 2nd runs)

SUNMOS: Sandia-UNM O/S
(Latency: 24 μsec, Comm: 175 MB/sec, Mem: 0.3 MB)
178 MB/sec on Grace (NAS benchmarks run faster)
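
To see what these latency and bandwidth figures mean for a solver, a common first-order estimate is message time = latency + message size / bandwidth. The sketch below applies that estimate to the numbers quoted above. The linear model, the function name, the message sizes, and the reading of 1 MB as 10**6 bytes are assumptions, not something stated on the slide.

```python
def message_time_us(nbytes, latency_us, bandwidth_mb_per_s):
    """Estimated time in microseconds to send one message of nbytes.

    With 1 MB taken as 10**6 bytes, 1 MB/sec equals 1 byte per microsecond,
    so nbytes / bandwidth is already in microseconds.
    """
    return latency_us + nbytes / bandwidth_mb_per_s


# (latency in microseconds, bandwidth in MB/sec) as quoted on the slide.
configs = {
    "OSF Rev 1.1": (85.0, 34.0),
    "OSF Rev 1.2": (50.0, 55.0),
    "SUNMOS": (24.0, 175.0),
}

for name, (latency, bandwidth) in configs.items():
    short = message_time_us(8, latency, bandwidth)          # one 8-byte value
    long = message_time_us(1_000_000, latency, bandwidth)   # a 1 MB block
    print(f"{name:12s} 8 B: {short:8.1f} usec   1 MB: {long:10.1f} usec")
```

Under this model short messages are dominated by latency and long messages by bandwidth, which is why both figures are tracked.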

Interprocessor Communication Methods

Broadcast (widely used)

Ring
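
The ring pattern can be illustrated without any message-passing library. In the sketch below each simulated processor starts with one block of data and, at every step, passes the block it last received to its right-hand neighbour; after P-1 steps every processor has seen every block, using only nearest-neighbour transfers. This is a generic ring exchange written for illustration; the function name is hypothetical and the slides do not detail the scheme used in the Langley solvers.

```python
def ring_allgather(blocks):
    """Simulate a ring exchange among len(blocks) processors.

    Returns, for each processor, the blocks in the order it received them.
    """
    p = len(blocks)
    gathered = [[block] for block in blocks]  # each starts with its own block
    in_flight = list(blocks)                  # block each processor sends next
    for _ in range(p - 1):
        # All processors send to neighbour (i + 1) % p at the same time, so
        # processor i receives what processor (i - 1) % p was holding.
        in_flight = [in_flight[(i - 1) % p] for i in range(p)]
        for i in range(p):
            gathered[i].append(in_flight[i])
    return gathered


result = ring_allgather(["a", "b", "c", "d"])
for rank, received in enumerate(result):
    print(f"processor {rank}: {received}")
```

In a broadcast, by contrast, one processor sends its block to all the others in a single collective operation.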

• Ring communication reduced solution time
• Slower than 1 Cray processor

Mach 2.4 HSCT Displacements

7,868 Elements
16,152 Equations 

Solution Time Breakdown
- Mach 3.0 HSCT -

• Communication dominates
• Computation scalable (< C-90)

7,868 Triangular Elements
1 Design Variable (skin thickness)

[Figure: Time (sec.) for Hand coded, ADIFOR and Finite Difference]
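
The three approaches compared in the figure can be sketched on a one-variable toy problem: the derivative of a bar's tip displacement with respect to its thickness. The example computes it with a hand-coded formula, a forward finite difference, and forward-mode automatic differentiation. ADIFOR is a Fortran source-transformation tool; the small `Dual` class below is a different, operator-overloading form of automatic differentiation used here only as a stand-in, and the bar model and all numerical values are assumptions.

```python
class Dual:
    """A value and its derivative, propagated by the chain rule."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)

    def __rtruediv__(self, other):
        return Dual(other) / self


def tip_displacement(t, force=1000.0, length=2.0, width=0.5, modulus=70e9):
    """Axial tip displacement u = F / k of a bar with k = E * w * t / L."""
    stiffness = modulus * width * t / length
    return force / stiffness


t0 = 0.002  # thickness at which the sensitivity is evaluated (made-up value)

# Hand coded: du/dt = -F * L / (E * w * t**2)
exact = -1000.0 * 2.0 / (70e9 * 0.5 * t0 ** 2)

# Finite difference: one extra analysis per design variable, step-size dependent
h = 1e-8
fd = (tip_displacement(t0 + h) - tip_displacement(t0)) / h

# Automatic differentiation: seed dt/dt = 1 and read off the derivative part
ad = tip_displacement(Dual(t0, 1.0)).der

print("hand coded       :", exact)
print("finite difference:", fd)
print("automatic diff.  :", ad)
```

The automatic-differentiation result matches the hand-coded derivative to rounding error without any derivative code being written by hand, while the finite-difference result carries a truncation error that depends on the step `h`.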

SESSION 5 Automatic Differentiation
Chaired by
Olaf Storaasli

5.1 Applications of Automatic Differentiation in Computational Fluid Dynamics -
Larry Green

5.2 Automatic Differentiation for Design Sensitivity Analysis of Structural Systems
Using Multiple Processors - Duc Nguyen, Olaf Storaasli, Jiangning Qin and
Ramzi Qamar





